STCOR-895 wait a loooong time for a "stale" rotation request #1548

Merged: 2 commits, Oct 21, 2024


@zburke zburke commented Oct 15, 2024

As part of the RTR lifecycle, we write a rotation timestamp to local storage when the process starts and then remove it when it ends. This is a cheap way of making the rotation request visible across tabs, because all tabs read the same shared storage.

To avoid the problem of a cancelled request leaving cruft in storage, we inspect that timestamp and consider a request "stale" if it's too old. That was the problem here: our "too old" timeout was too short. On a busy server, on a slow connection, or on a client far from its host (say, in New Zealand), two seconds was not long enough. The rotation request would still be active when stripes considered it "stale", allowing a second request to go through. But since the first request was just slow, not dead, the backend treated the second one as a token-replay attack and immediately terminated all active sessions for that user account.

Thus, waiting longer is a quick fix. A more thorough approach to tracking the request is described in the code comments attached to #1547.
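
For illustration, the timestamp check described above might look roughly like the sketch below. This is not the stripes-core code: the storage key, the function names, the in-memory stand-in for local storage, and the 60-second window are all assumptions made up for the example. Only the two-second value comes from the description.

```typescript
// Hypothetical sketch; names and values are illustrative, not the stripes-core code.
const ROTATION_KEY = "rtrRotationStartedAt"; // assumed storage key

// The subset of the Web Storage API that the sketch needs.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// In-memory stand-in for window.localStorage so the sketch runs outside a browser.
class MemoryStore implements KeyValueStore {
  private items = new Map<string, string>();

  getItem(key: string): string | null {
    const value = this.items.get(key);
    return value === undefined ? null : value;
  }

  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }

  removeItem(key: string): void {
    this.items.delete(key);
  }
}

// A rotation counts as in flight while its timestamp exists and is
// younger than the stale window.
function isRotationInFlight(store: KeyValueStore, nowMs: number, staleAfterMs: number): boolean {
  const raw = store.getItem(ROTATION_KEY);
  if (raw === null) return false;
  const startedAt = Number(raw);
  if (!Number.isFinite(startedAt)) return false; // unreadable marker: treat as cruft
  return nowMs - startedAt < staleAfterMs;
}

// Writes the timestamp and returns true if the caller may send the rotation request.
function tryStartRotation(store: KeyValueStore, nowMs: number, staleAfterMs: number): boolean {
  if (isRotationInFlight(store, nowMs, staleAfterMs)) return false;
  store.setItem(ROTATION_KEY, String(nowMs));
  return true;
}

// Removes the timestamp once the rotation request has ended.
function finishRotation(store: KeyValueStore): void {
  store.removeItem(ROTATION_KEY);
}

// Tab A starts rotating at t=0 and is still waiting on a slow response
// when tab B checks the shared storage three seconds later.
function secondTabMaySend(staleAfterMs: number): boolean {
  const shared = new MemoryStore();
  tryStartRotation(shared, 0, staleAfterMs); // tab A
  return tryStartRotation(shared, 3000, staleAfterMs); // tab B
}
```

With a two-second window, `secondTabMaySend(2000)` returns `true`: tab B sends a second request while tab A's is still pending, which is the replay scenario. With a longer window, such as the assumed `secondTabMaySend(60000)`, it returns `false` and tab B waits.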

Refs STCOR-895

Jest Unit Test Statistics

178 tests  ±0   178 ✔️ ±0   34s ⏱️ -1s
  25 suites ±0       0 💤 ±0 
    1 files   ±0       0 ±0 

Results for commit f3bb165. ± Comparison against base commit 1e6fc79.

BigTest Unit Test Statistics

    1 files  ±0      1 suites  ±0   10s ⏱️ ±0s
267 tests ±0  261 ✔️ ±0  6 💤 ±0  0 ±0 
270 runs  ±0  264 ✔️ ±0  6 💤 ±0  0 ±0 

Results for commit f3bb165. ± Comparison against base commit 1e6fc79.

@zburke zburke merged commit b2083cc into b10.1 Oct 21, 2024
6 checks passed
@zburke zburke deleted the STCOR-895-q branch October 21, 2024 19:12
zburke added a commit that referenced this pull request Oct 22, 2024
(cherry picked from commit b2083cc)